375 research outputs found

    Microarchitectural Low-Power Design Techniques for Embedded Microprocessors

    Get PDF
    With the omnipresence of embedded processing in all forms of electronics today, there is a strong trend towards wireless, battery-powered, portable embedded systems which have to operate under stringent energy constraints. Consequently, low power consumption and high energy efficiency have emerged as the two key criteria for embedded microprocessor design. In this thesis we present a range of microarchitectural low-power design techniques which enable the increase of performance for embedded microprocessors and/or the reduction of energy consumption, e.g., through voltage scaling. In the context of cryptographic applications, we explore the effectiveness of instruction set extensions (ISEs) for a range of different cryptographic hash functions (SHA-3 candidates) on a 16-bit microcontroller architecture (PIC24). Specifically, we demonstrate the effectiveness of light-weight ISEs based on lookup table integration and microcoded instructions using finite state machines for operand and address generation. On-node processing in autonomous wireless sensor node devices requires deeply embedded cores with extremely low power consumption. To address this need, we present TamaRISC, a custom-designed ISA with a corresponding ultra-low-power microarchitecture implementation. The TamaRISC architecture is employed in conjunction with an ISE and standard cell memories to design a sub-threshold capable processor system targeted at compressed sensing applications. We furthermore employ TamaRISC in a hybrid SIMD/MIMD multi-core architecture targeted at moderate to high processing requirements (> 1 MOPS). A range of different microarchitectural techniques for efficient memory organization are presented. Specifically, we introduce a configurable data memory mapping technique for private and shared access, as well as instruction broadcast together with synchronized code execution based on checkpointing. We then study an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduce the concept of dynamic timing margins. We show that dynamic timing margins exist in microprocessor circuits, and that these margins are to a large extent state-dependent and that they are correlated to the sequences of instruction types which are executed within the processor pipeline. To perform this analysis we propose a circuit/processor characterization flow and tool called dynamic timing analysis. Moreover, this flow is employed in order to devise a high-level instruction set simulation environment for impact-evaluation of timing errors on application performance. The presented approach improves the state of the art significantly in terms of simulation accuracy through the use of statistical fault injection. The dynamic timing margins in microprocessors are then systematically exploited for throughput improvements or energy reductions via our proposed instruction-based dynamic clock adjustment (DCA) technique. To this end, we introduce a 6-stage 32-bit microprocessor with cycle-by-cycle DCA. Besides a comprehensive design flow and simulation environment for evaluation of the DCA approach, we additionally present a silicon prototype of a DCA-enabled OpenRISC microarchitecture fabricated in 28 nm FD-SOI CMOS. The test chip includes a suitable clock generation unit which allows for cycle-by-cycle DCA over a wide range with fine granularity at frequencies exceeding 1 GHz. Measurement results of speedups and power reductions are provided

    Investigating the Potential of Custom Instruction Set Extensions for SHA-3 Candidates on a 16-bit Microcontroller Architecture

    Get PDF
    In this paper, we investigate the benefit of instruction set extensions for software implementations of all five SHA-3 candidates. To this end, we start from optimized assembly code for a common 16-bit microcontroller instruction set architecture. By themselves, these implementations provide reference for complexity of the algorithms on 16-bit architectures, commonly used in embedded systems. For each algorithm, we then propose suitable instruction set extensions and implement the modified processor core. We assess the gains in throughput, memory consumption, and the area overhead. Our results show that with less than 10% additional area, it is possible to increase the execution speed on average by almost 40%, while reducing memory requirements on average by more than 40%. In particular, the Grostl algorithm, which was one of the slowest algorithms in previous reference implementations, ends up being the fastest implementation by some margin, once minor (but dedicated) instruction set extensions are taken into account

    A Wireless Body Sensor Network For Activity Monitoring With Low Transmission Overhead

    Get PDF
    Activity recognition has been a research field of high interest over the last years, and it finds application in the medical domain, as well as personal healthcare monitoring during daily home- and sports-activities. With the aim of producing minimum discomfort while performing supervision of subjects, miniaturized networks of low-power wireless nodes are typically deployed on the body to gather and transmit physiological data, thus forming a Wireless Body Sensor Network (WBSN). In this work, we propose a WBSN for online activity monitoring, which combines the sensing capabilities of wearable nodes and the high computational resources of modern smartphones. The proposed solution provides different tradeoffs between classification accuracy and energy consumption, thanks to different workloads assigned to the nodes and to the mobile phone in different network configurations. In particular, our WBSN is able to achieve very high activity recognition accuracies (up to 97.2%) on multiple subjects, while significantly reducing the sampling frequency and the volume of transmitted data with respect to other state-of-the art solutions

    DynOR: A 32-bit Microprocessor in 28 nm FD-SOI with Cycle-By-Cycle Dynamic Clock Adjustment

    Get PDF
    This paper presents DynOR, a 32-bit 6-stage OpenRISC microprocessor with dynamic clock adjustment. To alleviate the issue of unused dynamic timing margins, the clock period of the processor is adjusted on a cycle-by-cycle level, based on the instruction types currently in flight in the pipeline. To this end, we employ a custom designed clock generation unit, capable of immediate glitch-free adjustments of the clock period over a wide range with fine granularity. Our chip measurements in 28nm FD-SOI technology show that DynOR provides an average speedup of 19% in program execution over a wide range of operating conditions, with a peak speedup for certain applications of up to 41%. Furthermore, this speedup can be traded off against energy, to reduce the chip power consumption for a typical die by up to 15%, compared to a static clocking scheme based on worst case excitation

    TamaRISC-CS: An Ultra-Low-Power Application-Specific Processor for Compressed Sensing

    Get PDF
    Compressed sensing (CS) is a universal technique for the compression of sparse signals. CS has been widely used in sensing platforms where portable, autonomous devices have to operate for long periods of time with limited energy resources. Therefore, an ultra-low-power (ULP) CS implementation is vital for these kind of energy-limited systems. Sub-threshold (sub-VT ) operation is commonly used for ULP computing, and can also be combined with CS. However, most established CS implementations can achieve either no or very limited benefit from sub-VT operation. Therefore, we propose a sub-VT application-specific instruction-set processor (ASIP), exploiting the specific operations of CS. Our results show that the proposed ASIP accomplishes 62x speed-up and 11.6x power savings with respect to an established CS implementation running on the baseline low-power processor

    1 Mbps coherent one-way QKD with dense wavelength division multiplexing and hardware key distillation

    Get PDF
    We present the latest results obtained with a quantum cryptography prototype based on a coherent-one way quantum key distribution (QKD) scheme. To support its continuous high rate secret key generation we developed different low-noise single photon detectors for telecom wavelength based on a sine gating and low-pass-filtering technique, as well as a negative feedback APD in an active hold-off circuit. A newly developed hardware distillation engine allows for continuous operation of secret key distribution up to 1 Mbps. We also present results of our system in a DWDM (dense wavelength-division multiplexing) configuration where only one single fiber is needed to interconnect Alice’ and Bob’s systems. The final prototype is fully compatible to serve a high-speed encryption device developed in parallel which provides encrypted communication of up to 100 Gbps

    Optimasi Portofolio Resiko Menggunakan Model Markowitz MVO Dikaitkan dengan Keterbatasan Manusia dalam Memprediksi Masa Depan dalam Perspektif Al-Qur`an

    Full text link
    Risk portfolio on modern finance has become increasingly technical, requiring the use of sophisticated mathematical tools in both research and practice. Since companies cannot insure themselves completely against risk, as human incompetence in predicting the future precisely that written in Al-Quran surah Luqman verse 34, they have to manage it to yield an optimal portfolio. The objective here is to minimize the variance among all portfolios, or alternatively, to maximize expected return among all portfolios that has at least a certain expected return. Furthermore, this study focuses on optimizing risk portfolio so called Markowitz MVO (Mean-Variance Optimization). Some theoretical frameworks for analysis are arithmetic mean, geometric mean, variance, covariance, linear programming, and quadratic programming. Moreover, finding a minimum variance portfolio produces a convex quadratic programming, that is minimizing the objective function ðð¥with constraintsð ð 𥠥 ðandð´ð¥ = ð. The outcome of this research is the solution of optimal risk portofolio in some investments that could be finished smoothly using MATLAB R2007b software together with its graphic analysis

    Search for heavy resonances decaying to two Higgs bosons in final states containing four b quarks

    Get PDF
    A search is presented for narrow heavy resonances X decaying into pairs of Higgs bosons (H) in proton-proton collisions collected by the CMS experiment at the LHC at root s = 8 TeV. The data correspond to an integrated luminosity of 19.7 fb(-1). The search considers HH resonances with masses between 1 and 3 TeV, having final states of two b quark pairs. Each Higgs boson is produced with large momentum, and the hadronization products of the pair of b quarks can usually be reconstructed as single large jets. The background from multijet and t (t) over bar events is significantly reduced by applying requirements related to the flavor of the jet, its mass, and its substructure. The signal would be identified as a peak on top of the dijet invariant mass spectrum of the remaining background events. No evidence is observed for such a signal. Upper limits obtained at 95 confidence level for the product of the production cross section and branching fraction sigma(gg -> X) B(X -> HH -> b (b) over barb (b) over bar) range from 10 to 1.5 fb for the mass of X from 1.15 to 2.0 TeV, significantly extending previous searches. For a warped extra dimension theory with amass scale Lambda(R) = 1 TeV, the data exclude radion scalar masses between 1.15 and 1.55 TeV

    Measurement of the top quark mass using charged particles in pp collisions at root s=8 TeV

    Get PDF
    Peer reviewe
    corecore